Motivation

TODO

Methodology

Initial exploration: Cross Correlation

Our first approach to this DAP looked at the relationship between the indicator and the case signal more generally. We used cross-correlation analysis on the time series to identify the relationship between indicators and cases across a time period. For two time series \(y, x \in \mathbb{R}^T\), cross-correlation is defined as:

\[\max_{i} \operatorname{Corr}\left(y_{i+1, \ldots, T},\; x_{1, \ldots, T-i}\right),\] which is the maximum Pearson correlation between the two series when one is lagged by \(i\) time steps.
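As a rough illustration of this definition, the sketch below computes the lagged Pearson correlation for each candidate lag and returns the best one. The function name `max_cross_correlation`, the `max_lag` bound, and the synthetic series are assumptions made for this example; they are not part of LeadingIndicatorTools.

```r
# Maximum lagged Pearson correlation between cases y and indicator x.
# max_lag is an illustrative assumption; the definition leaves the range of i open.
max_cross_correlation <- function(y, x, max_lag = 21) {
  stopifnot(length(y) == length(x), max_lag < length(y) - 2)
  n <- length(y)
  lags <- 0:max_lag
  cors <- vapply(lags, function(i) {
    # Pair y[(i+1)..T] with x[1..(T-i)]: the indicator leads cases by i days.
    cor(y[(i + 1):n], x[1:(n - i)])
  }, numeric(1))
  best <- which.max(cors)
  list(lag = lags[best], correlation = cors[best])
}

# Synthetic example: cases are the indicator delayed by 5 days.
set.seed(1)
x <- cumsum(rnorm(100))
y <- c(rep(x[1], 5), x[1:95])
res <- max_cross_correlation(y, x)
```

Restricting the search to non-negative lags only asks whether the indicator leads cases; allowing negative lags would also test the reverse direction.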

We calculated the cross-correlation and the optimal lag in each county. An example of this data over all observed counties for the Drs Visits indicator signal:

cross_correlation_plot

This exploratory analysis hints at a predictive relationship: indicators, observed in advance, can be highly correlated with case counts. However, this analysis is more general than our project’s goal, which concerns periods of increase specifically rather than the overall relationship between two signals. The main analysis in the project focuses on sharp upswings in cases, which are significant within a county, of particular public health importance, and of practical importance for modeling and forecasting. We want to analyze the leading-ness of a significant indicator rise in relation to a significant case rise. To do this, we need to define and find periods of significant rise in these signals.

Identifying Rises in Signals

In order to ascertain leading-ness of an indicator, we want to determine whether the indicator began to rise significantly before cases began to rise significantly.

As a core component of this analysis, we need a methodology to accurately identify significant rises, given a single time series of an indicator. This is non-trivial, since the data is quite noisy at the county level, and clean rises and drops are rare.

Starting at the Peak
At first we experimented with finding the peak of a signal in a given time period and identifying the closest local minimum that precedes the peak. However, this is not always the point at which the signal actually begins to rise (it could be caught in a shallow local minimum), and it does not guarantee that the rise is a lengthy or steep one. This method also picks only one rise period per signal per county for the given time period, which isn’t always reflective of the signal’s actual behavior.

Best Fit Line
One option we tried was calculating a line of best fit for the signal over fixed-length windows within a larger time period. For example, we calculated a line of best fit for every 21-day window within a 3-month period and chose the window with the highest slope as the most significant rise period in that county for that signal.

  • Where this worked well:
    • This method finds periods of consistent rise, not allowing the signal to vary a lot between the beginning and ending points of the rise period.
  • Where this had drawbacks:
    • Many times this method only selects the larger end of a rise, since the window is a fixed size, and leaves out where the rise starts.
    • This method also only allows for one rise per time period/county.
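The windowed best-fit idea can be sketched as follows. This is a minimal illustration on a synthetic signal, not the code used in the analysis; the function name `steepest_window` is hypothetical, and only the 21-day window length comes from the text.

```r
# Slope of the least-squares line in every fixed-length window; return the steepest one.
# window = 21 follows the example in the text; the function name is hypothetical.
steepest_window <- function(values, window = 21) {
  stopifnot(length(values) >= window)
  starts <- seq_len(length(values) - window + 1)
  days <- seq_len(window)
  slopes <- vapply(starts, function(s) {
    y <- values[s:(s + window - 1)]
    unname(coef(lm(y ~ days))[2])
  }, numeric(1))
  best <- which.max(slopes)
  list(start = best, end = best + window - 1, slope = slopes[best])
}

# Synthetic 90-day signal: flat, then a linear rise from day 40 to day 60, then flat.
signal <- c(rep(1, 39), seq(1, 11, length.out = 21), rep(11, 30))
res <- steepest_window(signal)
```

Because the function returns a single window of fixed length, it reproduces both drawbacks listed above: one rise per period, and no way to extend the window back to where a longer rise actually starts.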

Example plot

best_fit_plot


Estimated Derivative
We then tried several derivative-estimation methods to identify periods where the estimated derivative at each point is above a certain threshold.

  • Where this worked well:
    • It can find smaller periods of continuous rise that are flexible in length.
    • It allows for multiple periods of rise per time period.
  • Where this had drawbacks:
    • Using a large fixed window size (14 or 21 days) in the derivative estimation function means that many identified points are actually within a decreasing period after a rise.
    • It can label rises too liberally, identifying very small bumps in the signal as rises.
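One way to sketch the derivative-threshold idea is with base R's `smooth.spline`, which can return an estimated first derivative through `predict(..., deriv = 1)`. The function name `flag_rising_days`, the default spline settings, the threshold of 0, and the synthetic signal are all assumptions for illustration.

```r
# Flag days whose estimated first derivative exceeds a threshold.
# The spline settings and the threshold are illustrative assumptions.
flag_rising_days <- function(values, threshold = 0) {
  days <- seq_along(values)
  fit <- smooth.spline(days, values)
  deriv <- predict(fit, days, deriv = 1)$y
  deriv > threshold
}

# Synthetic signal: falls until about day 30, then rises, with a small wiggle added.
days <- 1:60
values <- (days - 30)^2 / 100 + 0.02 * sin(1.7 * days)
rising <- flag_rising_days(values)
```

With a threshold at or near 0, even very small bumps are flagged, which is the over-labelling drawback noted above.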

Example plot

Blue uses the smoothing spline method, red uses local linear regression, and purple marks points flagged by both methods.

estimated_deriv_plot


Final Method

We saw that smoothing the signal first with smoothing splines (in addition to the 7-day average smoothing already applied to the data, e.g. 7-day average CLI) and then using the derivative method produced the best results. Tweaking this method with some additional decision rules gave us our best outcome for finding periods of significant rise.

Final criteria for rise periods: A period is a significant rise in a smoothed signal if

  1. First derivative at each point is > 0 - this means the signal is in fact rising on every day

  2. Period is > a certain number of days (for this analysis we used TODO) - this means the rise is not spurious

  3. Each first derivative is > a certain % of other derivatives in the time period (Note: for this analysis we set this to 0%, effectively not using this parameter) - if set above 0, this can mean the rise is a significant one for this county, but it also ties the decision to the specific time period we are looking at, so the identified rise points can change with the time period.

  4. Magnitude of increase from start to end of period is > a certain threshold (for this analysis we used TODO) - this is another way to make sure that the rise is significant, not just a slight uptick in cases

Finally, we take the point at the beginning of each rise period as the best estimation of a point of inflection where a signal begins to rise significantly, so we can address the question: Does the beginning of a rise in the indicator come before the beginning of a rise in cases?
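The four criteria and the choice of the first day of each period can be sketched in base R as below. This is not the package implementation: the function name `find_rise_points` and the defaults for `min_days`, `min_increase` and `quantile_cutoff` are placeholders, since the values used in the analysis are not stated here.

```r
# A minimal sketch of the four criteria; all defaults are placeholders,
# not the values used in the analysis.
find_rise_points <- function(values, min_days = 7, min_increase = 1,
                             quantile_cutoff = 0) {
  days <- seq_along(values)
  fit <- smooth.spline(days, values)
  smoothed <- predict(fit, days)$y
  deriv <- predict(fit, days, deriv = 1)$y

  # Criteria 1 and 3: derivative is positive and above the chosen quantile
  # of all derivatives (a cutoff of 0 disables criterion 3).
  cutoff <- if (quantile_cutoff > 0) quantile(deriv, quantile_cutoff) else -Inf
  rising <- deriv > 0 & deriv > cutoff

  # Group consecutive rising days into runs.
  runs <- rle(rising)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1

  # Criterion 2 (length) and criterion 4 (increase over the run).
  keep <- runs$values &
    runs$lengths > min_days &
    (smoothed[ends] - smoothed[starts]) > min_increase

  # Mark the first day of each kept run with a 1.
  rise_point <- integer(length(values))
  rise_point[starts[keep]] <- 1L
  rise_point
}

# Synthetic signal: falls until about day 30, then rises steadily.
days <- 1:90
values <- 10 + 0.01 * (days - 30)^2 + 0.02 * sin(1.7 * days)
rise_point <- find_rise_points(values)
```

The returned 0/1 vector has the same shape as the `case_rise_point` and `indicator_rise_point` columns shown in the walkthrough.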

In our analysis, we look at this on a county by county basis, for specific time periods during the pandemic, like the “Summer wave” or the “Fall wave.” TODO is this still true?



County Selection

In our analysis, we include all counties that have greater than 2000 cases (a little over 20 cases a day for a 3 month window), 80 days of indicator data for a 3 month window, and do not have zero or negative values for either cases or the indicator. TODO is this still true?
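A hedged sketch of these inclusion rules, applied to a hypothetical per-county summary table, is shown below. The column names, the toy values, and the reading of "80 days" as "at least 80 days" are assumptions; the thresholds of 2000 cases and 80 days come from the text.

```r
# Apply the inclusion rules to a per-county summary table.
# Column names are hypothetical and chosen only for this illustration.
select_counties <- function(summary_df, min_cases = 2000, min_indicator_days = 80) {
  keep <- summary_df$total_cases > min_cases &
    summary_df$indicator_days >= min_indicator_days &
    summary_df$min_case_value > 0 &
    summary_df$min_indicator_value > 0
  summary_df$geo_value[keep]
}

# Toy summary table (made-up values).
county_summary <- data.frame(
  geo_value = c("01003", "01005", "01007", "01009"),
  total_cases = c(3500, 1500, 2600, 4000),
  indicator_days = c(92, 92, 60, 85),
  min_case_value = c(1.5, 0.2, 3.0, 0),
  min_indicator_value = c(1.1, 0.4, 0.9, 0.5),
  stringsAsFactors = FALSE
)
selected <- select_counties(county_summary)
```

In this toy table only the first county passes: the second has too few cases, the third too few days of indicator data, and the fourth a zero case value.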

Recall and Precision

TODO


Walkthrough

In this section, we describe our pipeline for processing, plotting and analyzing the data using the methodology described above.

Step 1. Get and prepare county data

As an example, we’ll use our Dr Visits % CLI as our indicator, and the summer as our time period. We use our LeadingIndicatorTools package for all our main functions.

drs_visits_prepared_summer = get_and_parse_signals("2020-06-01", "2020-08-31", "doctor-visits", "smoothed_adj_cli", 2000, 80)
## Warning: The `...` argument of `group_keys()` is deprecated as of dplyr 1.0.0.
## Please `group_by()` first


Step 2. Plot the Drs Visits and the case signal together for an example county.

drs_summer = get_increase_points(drs_visits_prepared_summer$cases, drs_visits_prepared_summer$indicator)
plot_signals(drs_summer, "01003", smooth_and_show_increase_point=FALSE, "Drs Visits")


Step 3. Mark the rise points (the points at the beginning of the rise periods) for the Drs Visits and Cases signal.

In the respective rise point columns, a day is marked with a 1 if it is found to be a rise point for that signal. We can see here that there is a rise point for Drs Visits on 6/18 and for cases on 6/26.

drs_summer[1]
## [[1]]
##    time_value geo_value case_value ind_value case_rise_point
## 1  2020-06-01     01003   2.285714  2.591397               1
## 2  2020-06-02     01003   2.142857  2.057542               0
## 3  2020-06-03     01003   1.571429  1.621858               0
## 4  2020-06-04     01003   1.714286  1.472914               0
## 5  2020-06-05     01003   2.000000  1.672025               0
## 6  2020-06-06     01003   3.000000  1.778825               0
## 7  2020-06-07     01003   3.571429  1.885602               0
## 8  2020-06-08     01003   4.000000  2.008342               0
## 9  2020-06-09     01003   4.714286  2.300228               0
## 10 2020-06-10     01003   5.571429  2.182529               0
## 11 2020-06-11     01003   7.142857  1.955341               0
## 12 2020-06-12     01003   8.142857  1.869405               0
## 13 2020-06-13     01003   8.142857  1.710156               0
## 14 2020-06-14     01003   7.285714  1.601059               0
## 15 2020-06-15     01003   9.000000  1.517344               0
## 16 2020-06-16     01003   9.142857  1.332368               0
## 17 2020-06-17     01003   8.714286  1.107573               0
## 18 2020-06-18     01003   8.285714  1.351734               0
## 19 2020-06-19     01003   8.571429  1.745868               0
## 20 2020-06-20     01003   8.428571  2.264708               0
## 21 2020-06-21     01003   9.428571  2.834244               0
## 22 2020-06-22     01003   7.714286  3.241748               0
## 23 2020-06-23     01003   8.714286  3.510611               0
## 24 2020-06-24     01003  10.285714  3.674556               0
## 25 2020-06-25     01003  10.857143  3.581878               0
## 26 2020-06-26     01003  14.571429  3.290597               0
## 27 2020-06-27     01003  19.285714  3.585068               0
## 28 2020-06-28     01003  20.714286  3.962742               0
## 29 2020-06-29     01003  29.428571  4.295182               0
## 30 2020-06-30     01003  32.857143  4.992063               0
## 31 2020-07-01     01003  34.142857  5.485732               0
## 32 2020-07-02     01003  39.142857  5.911132               0
## 33 2020-07-03     01003  47.142857  6.084344               0
## 34 2020-07-04     01003  44.000000  6.427596               0
## 35 2020-07-05     01003  43.714286  6.822692               0
## 36 2020-07-06     01003  38.285714  7.248379               0
## 37 2020-07-07     01003  45.285714  8.381805               0
## 38 2020-07-08     01003  50.428571  8.751762               0
## 39 2020-07-09     01003  54.285714  8.718551               0
## 40 2020-07-10     01003  48.857143  8.918230               0
## 41 2020-07-11     01003  51.571429  9.440457               0
## 42 2020-07-12     01003  59.000000  9.903577               0
## 43 2020-07-13     01003  64.000000 10.334842               0
## 44 2020-07-14     01003  59.571429  8.921196               0
## 45 2020-07-15     01003  66.000000  7.922708               0
## 46 2020-07-16     01003  66.857143  7.748389               0
## 47 2020-07-17     01003  71.714286  7.546518               0
## 48 2020-07-18     01003  85.000000  7.418840               0
## 49 2020-07-19     01003  91.857143  7.416810               0
## 50 2020-07-20     01003  93.428571  7.473701               0
## 51 2020-07-21     01003  98.285714  7.966899               0
## 52 2020-07-22     01003  96.857143  7.355723               0
## 53 2020-07-23     01003 123.142857  6.977930               0
## 54 2020-07-24     01003 117.714286  6.833789               0
## 55 2020-07-25     01003 120.428571  7.083536               0
## 56 2020-07-26     01003 110.142857  7.285879               0
## 57 2020-07-27     01003 117.571429  7.476056               0
## 58 2020-07-28     01003 104.714286  7.393050               0
## 59 2020-07-29     01003  91.285714  6.987763               0
## 60 2020-07-30     01003  81.000000  6.626541               0
## 61 2020-07-31     01003  84.000000  6.682756               0
## 62 2020-08-01     01003  68.571429  6.462142               0
## 63 2020-08-02     01003  73.571429  6.311417               0
## 64 2020-08-03     01003  61.285714  6.214892               0
## 65 2020-08-04     01003  69.285714  6.607147               0
## 66 2020-08-05     01003  77.857143  6.786898               0
## 67 2020-08-06     01003  58.571429  6.640317               0
## 68 2020-08-07     01003  57.571429  6.277139               0
## 69 2020-08-08     01003  66.285714  5.670363               0
## 70 2020-08-09     01003  54.714286  5.093669               0
## 71 2020-08-10     01003  64.142857  4.641709               0
## 72 2020-08-11     01003  59.428571  4.583945               0
## 73 2020-08-12     01003  56.571429  4.629656               0
## 74 2020-08-13     01003  53.571429  4.489354               0
## 75 2020-08-14     01003  50.857143  4.255767               0
## 76 2020-08-15     01003  43.285714  4.266932               0
## 77 2020-08-16     01003  48.857143  4.282707               0
## 78 2020-08-17     01003  35.142857  4.287946               0
## 79 2020-08-18     01003  34.428571  3.873270               0
## 80 2020-08-19     01003  32.285714  3.521795               0
## 81 2020-08-20     01003  31.714286  3.491025               0
## 82 2020-08-21     01003  27.714286  3.445380               0
## 83 2020-08-22     01003  29.428571  3.262581               0
## 84 2020-08-23     01003  28.428571  3.155857               0
## 85 2020-08-24     01003  29.571429  3.075220               0
## 86 2020-08-25     01003  30.428571  3.269225               0
## 87 2020-08-26     01003  37.571429  3.194299               0
## 88 2020-08-27     01003  39.428571  2.938526               0
## 89 2020-08-28     01003  41.857143  2.783125               0
## 90 2020-08-29     01003  44.142857  3.147181               0
## 91 2020-08-30     01003  54.000000  3.435437               0
## 92 2020-08-31     01003  54.000000  3.613825               0
##    indicator_rise_point
## 1                     0
## 2                     0
## 3                     0
## 4                     0
## 5                     0
## 6                     0
## 7                     0
## 8                     0
## 9                     0
## 10                    0
## 11                    0
## 12                    0
## 13                    0
## 14                    0
## 15                    1
## 16                    0
## 17                    0
## 18                    0
## 19                    0
## 20                    0
## 21                    0
## 22                    0
## 23                    0
## 24                    0
## 25                    0
## 26                    0
## 27                    0
## 28                    0
## 29                    0
## 30                    0
## 31                    0
## 32                    0
## 33                    0
## 34                    0
## 35                    0
## 36                    0
## 37                    0
## 38                    0
## 39                    0
## 40                    0
## 41                    0
## 42                    0
## 43                    0
## 44                    0
## 45                    0
## 46                    0
## 47                    0
## 48                    0
## 49                    0
## 50                    0
## 51                    0
## 52                    0
## 53                    0
## 54                    0
## 55                    0
## 56                    0
## 57                    0
## 58                    0
## 59                    0
## 60                    0
## 61                    0
## 62                    0
## 63                    0
## 64                    0
## 65                    0
## 66                    0
## 67                    0
## 68                    0
## 69                    0
## 70                    0
## 71                    0
## 72                    0
## 73                    0
## 74                    0
## 75                    0
## 76                    0
## 77                    0
## 78                    0
## 79                    0
## 80                    0
## 81                    0
## 82                    0
## 83                    0
## 84                    0
## 85                    0
## 86                    0
## 87                    0
## 88                    0
## 89                    0
## 90                    0
## 91                    0
## 92                    0


Step 4. Plot the smoothed signal with the beginning rise points.

We can see that Drs Visits begins to rise before cases rise. TODO I think we need to tweak our rise point method a bit so we don’t have these “double counting” points on a rise.

plot_signals(drs_summer, "01003", smooth_and_show_increase_point=TRUE, "Drs Visits")

Analysis

TODO on this whole section. Need to rework and add in Vishnu’s analysis.

Doctor Visits

We can plot some of the counties where rises in the doctor visits indicator consistently lead rises in cases.

Examples from the Summer
## Warning: Some inputs were not uniquely matched; returning only the first match
## in each case.

Examples from the Fall

Examples of counties that show successes in both Summer and Fall

Fall Frequencies

We can also look at the distribution of the frequency of the number of days by which Doctor Visits’ rises lead case rises in successful counties

Summer Frequencies

We can also look at the distribution of the frequency of the number of days by which Doctor Visits’ rises lead case rises in successful counties

Change Healthcare

We can plot some of the counties where rises in the Change Healthcare indicator consistently lead rises in cases.

Examples from the Summer

Examples from the Fall

Examples of counties that show successes in both Summer and Fall

Fall Frequencies

We can also look at the distribution of the frequency of the number of days by which Change Healthcare rises lead case rises in successful counties

Summer Frequencies

We can also look at the distribution of the frequency of the number of days by which Change Healthcare rises lead case rises in successful counties

Indicator Combination

We can plot some of the counties where rises in the indicator combination consistently lead rises in cases.

Examples from the Summer

Examples from the Fall

Examples of counties that show successes in both Summer and Fall

Fall Frequencies

We can also look at the distribution of the frequency of the number of days by which Indicator Combination rises lead case rises in successful counties

Summer Frequencies

We can also look at the distribution of the frequency of the number of days by which Indicator Combination rises lead case rises in successful counties

Note that the counties shown and counted as “successful” here met the following criteria: every indicator rise point was followed by a case rise point within 3 to 14 days, and every case rise point was preceded by an indicator rise point within the same window. This means that the displayed counties almost always have only one rise point each for the indicator and for cases (the more rise points there are, the harder it is to meet the criteria).
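The success criterion can be written as a small check on the rise-point dates of one county. The sketch below is an illustration only: the function name `is_successful_county` is hypothetical, and counting a county with no rise points as unsuccessful is an assumption. The 3 and 14 day bounds follow the text.

```r
# Check the "successful county" rule for one county's rise-point dates.
# The function name is hypothetical; the 3 and 14 day bounds follow the text.
is_successful_county <- function(indicator_rises, case_rises,
                                 min_lead = 3, max_lead = 14) {
  # Assumption: a county with no rise points in either signal is not a success.
  if (length(indicator_rises) == 0 || length(case_rises) == 0) return(FALSE)
  # lead[i, j]: days by which indicator rise i precedes case rise j.
  lead <- outer(as.numeric(indicator_rises), as.numeric(case_rises),
                function(ind, cas) cas - ind)
  in_window <- lead >= min_lead & lead <= max_lead
  # Every indicator rise needs a following case rise in the window,
  # and every case rise needs a preceding indicator rise in the window.
  all(rowSums(in_window) > 0) && all(colSums(in_window) > 0)
}

# Indicator rise on 6/18 and case rise on 6/26: an 8-day lead, inside the window.
ok <- is_successful_county(as.Date("2020-06-18"), as.Date("2020-06-26"))
# Hypothetical case rise 22 days later: outside the window.
too_late <- is_successful_county(as.Date("2020-06-18"), as.Date("2020-07-10"))
```

Because both directions must hold for every rise point, each additional rise point in either signal adds a condition, which is why successful counties tend to have a single rise point per signal.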

Performance

Recall and precision.

Conclusions and Limitations